{
   "cells": [
      {
         "cell_type": "markdown",
         "id": "9c48213d-6e6a-4c10-838a-2a7c710c3a05",
         "metadata": {},
         "source": [
            "# Simple Index Demo + ChatGPT"
         ]
      },
      {
         "cell_type": "markdown",
         "id": "e34da56e-bc3b-433e-b65c-96edea4db5dd",
         "metadata": {},
         "source": [
            "Use a very simple wrapper around the ChatGPT API"
         ]
      },
      {
         "cell_type": "markdown",
         "id": "50d3b817-b70e-4667-be4f-d3a0fe4bd119",
         "metadata": {},
         "source": [
            "#### Load documents, build the GPTSimpleVectorIndex"
         ]
      },
      {
         "cell_type": "code",
         "execution_count": 7,
         "id": "690a6918-7c75-4f95-9ccc-d2c4a1fe00d7",
         "metadata": {
            "tags": []
         },
         "outputs": [
            {
               "name": "stderr",
               "output_type": "stream",
               "text": [
                  "/Users/jerryliu/Programming/gpt_index/.venv/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
                  "  from .autonotebook import tqdm as notebook_tqdm\n"
               ]
            }
         ],
         "source": [
            "import logging\n",
            "import sys\n",
            "\n",
            "logging.basicConfig(stream=sys.stdout, level=logging.INFO)\n",
            "logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))\n",
            "\n",
            "from gpt_index import GPTSimpleVectorIndex, SimpleDirectoryReader, LLMPredictor, ServiceContext\n",
            "from langchain.chat_models import ChatOpenAI\n",
            "from IPython.display import Markdown, display"
         ]
      },
      {
         "cell_type": "code",
         "execution_count": 8,
         "id": "03d1691e-544b-454f-825b-5ee12f7faa8a",
         "metadata": {
            "tags": []
         },
         "outputs": [],
         "source": [
            "# load documents\n",
            "documents = SimpleDirectoryReader('../paul_graham_essay/data').load_data()"
         ]
      },
      {
         "cell_type": "code",
         "execution_count": null,
         "id": "6cc980e8-f4e1-4fad-93f8-ab1bbaa874f3",
         "metadata": {
            "tags": []
         },
         "outputs": [
            {
               "name": "stdout",
               "output_type": "stream",
               "text": [
                  "INFO:gpt_index.token_counter.token_counter:> [build_index_from_documents] Total LLM token usage: 0 tokens\n",
                  "> [build_index_from_documents] Total LLM token usage: 0 tokens\n",
                  "INFO:gpt_index.token_counter.token_counter:> [build_index_from_documents] Total embedding token usage: 18579 tokens\n",
                  "> [build_index_from_documents] Total embedding token usage: 18579 tokens\n"
               ]
            }
         ],
         "source": [
            "# LLM Predictor (gpt-3.5-turbo) + service context\n",
            "llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0, model_name=\"gpt-3.5-turbo\"))\n",
            "service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, chunk_size_limit=512)"
         ]
      },
      {
         "cell_type": "code",
         "execution_count": null,
         "id": "ad144ee7-96da-4dd6-be00-fd6cf0c78e58",
         "metadata": {
            "scrolled": true,
            "tags": []
         },
         "outputs": [],
         "source": [
            "index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)"
         ]
      },
      {
         "cell_type": "markdown",
         "id": "b6caf93b-6345-4c65-a346-a95b0f1746c4",
         "metadata": {},
         "source": [
            "#### Query Index"
         ]
      },
      {
         "cell_type": "markdown",
         "id": "83e2905e-3789-4793-82b9-0ac488246824",
         "metadata": {},
         "source": [
            "By default, with the help of langchain's PromptSelector abstraction, we use \n",
            "a modified refine prompt tailored for ChatGPT-use if the ChatGPT model is used."
         ]
      },
      {
         "cell_type": "code",
         "execution_count": 5,
         "id": "85466fdf-93f3-4cb1-a5f9-0056a8245a6f",
         "metadata": {
            "scrolled": true,
            "tags": []
         },
         "outputs": [
            {
               "name": "stderr",
               "output_type": "stream",
               "text": [
                  "\n",
                  "KeyboardInterrupt\n",
                  "\n"
               ]
            }
         ],
         "source": [
            "response = index.query(\n",
            "    \"What did the author do growing up?\", \n",
            "    service_context=service_context,\n",
            "    similarity_top_k=3\n",
            ")"
         ]
      },
      {
         "cell_type": "code",
         "execution_count": 6,
         "id": "bdda1b2c-ae46-47cf-91d7-3153e8d0473b",
         "metadata": {
            "tags": []
         },
         "outputs": [
            {
               "data": {
                  "text/markdown": [
                     "<b>Before college, the author worked on writing essays and programming. They wrote short stories and essays on various topics. They also tried programming on an IBM 1401 in 9th grade using an early version of Fortran. In college, the author studied painting and art history and spent a year in Florence, Italy, where they painted portraits of people and still life. After college, the author worked on software development and co-founded a startup called Viaweb. Later, the author spent three months writing essays in 2015 before returning to work on Bel, a programming language they had been developing for years. The author worked intensively on Bel, often having chunks of the code in their head and working on it while watching their children play. Most of Bel was written while the author was living in England, where they moved with their family in 2016. In the fall of 2019, Bel was finally finished, and the author resumed writing essays and thinking about other projects.</b>"
                  ],
                  "text/plain": [
                     "<IPython.core.display.Markdown object>"
                  ]
               },
               "metadata": {},
               "output_type": "display_data"
            }
         ],
         "source": [
            "display(Markdown(f\"<b>{response}</b>\"))"
         ]
      },
      {
         "cell_type": "code",
         "execution_count": null,
         "id": "ec88df57",
         "metadata": {
            "tags": []
         },
         "outputs": [],
         "source": [
            "response = index.query(\n",
            "    \"What did the author do during his time at RISD?\", \n",
            "    service_context=service_context,\n",
            "    similarity_top_k=5\n",
            ")"
         ]
      },
      {
         "cell_type": "code",
         "execution_count": 9,
         "id": "67e8e675-1b03-423a-b53e-23ab278ba03b",
         "metadata": {
            "tags": []
         },
         "outputs": [
            {
               "data": {
                  "text/markdown": [
                     "<b>The author attended RISD to learn how to paint and took a color class there. However, he mostly taught himself to paint and dropped out in 1993. He then moved to New York City to pursue his career as an artist.</b>"
                  ],
                  "text/plain": [
                     "<IPython.core.display.Markdown object>"
                  ]
               },
               "metadata": {},
               "output_type": "display_data"
            }
         ],
         "source": [
            "display(Markdown(f\"<b>{response}</b>\"))"
         ]
      },
      {
         "cell_type": "markdown",
         "id": "88ca1808-d112-4c28-b110-b65dcc9b7207",
         "metadata": {},
         "source": [
            "**Refine Prompt**: Here is the chat refine prompt "
         ]
      },
      {
         "cell_type": "code",
         "execution_count": 11,
         "id": "2f0c270d-9de5-40bf-88fc-83a360523db0",
         "metadata": {
            "tags": []
         },
         "outputs": [],
         "source": [
            "from gpt_index.prompts.chat_prompts import CHAT_REFINE_PROMPT"
         ]
      },
      {
         "cell_type": "code",
         "execution_count": 22,
         "id": "4db38651-9790-4a61-ac3d-689ce6dfa369",
         "metadata": {
            "tags": []
         },
         "outputs": [
            {
               "data": {
                  "text/plain": [
                     "{'input_variables': ['context_msg', 'query_str', 'existing_answer'],\n",
                     " 'output_parser': None,\n",
                     " 'partial_variables': {},\n",
                     " 'messages': [HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['query_str'], output_parser=None, partial_variables={}, template='{query_str}', template_format='f-string', validate_template=True), additional_kwargs={}),\n",
                     "  AIMessagePromptTemplate(prompt=PromptTemplate(input_variables=['existing_answer'], output_parser=None, partial_variables={}, template='{existing_answer}', template_format='f-string', validate_template=True), additional_kwargs={}),\n",
                     "  HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context_msg'], output_parser=None, partial_variables={}, template=\"We have the opportunity to refine the above answer (only if needed) with some more context below.\\n------------\\n{context_msg}\\n------------\\nGiven the new context, refine the original answer to better answer the question. If the context isn't useful, output the original answer again.\", template_format='f-string', validate_template=True), additional_kwargs={})]}"
                  ]
               },
               "execution_count": 22,
               "metadata": {},
               "output_type": "execute_result"
            }
         ],
         "source": [
            "dict(CHAT_REFINE_PROMPT.prompt)"
         ]
      },
      {
         "cell_type": "markdown",
         "id": "6cb664e8-f53f-4d6c-a086-1f2784cc1dc8",
         "metadata": {},
         "source": [
            "#### Query Index (Using the standard Refine Prompt)\n",
            "\n",
            "If we use the \"standard\" refine prompt (where the prompt is one text template instead of multiple messages), we find that the results over ChatGPT are worse. "
         ]
      },
      {
         "cell_type": "code",
         "execution_count": 27,
         "id": "29c416f8-d5ab-47d6-8b16-f615bfa58219",
         "metadata": {
            "tags": []
         },
         "outputs": [],
         "source": [
            "from gpt_index.prompts.default_prompts import DEFAULT_REFINE_PROMPT"
         ]
      },
      {
         "cell_type": "code",
         "execution_count": null,
         "id": "3df1acc4-735a-48ac-9fb4-73d9d7eabc02",
         "metadata": {
            "tags": []
         },
         "outputs": [],
         "source": [
            "response = index.query(\n",
            "    \"What did the author do during his time at RISD?\", \n",
            "    service_context=service_context,\n",
            "    refine_template=DEFAULT_REFINE_PROMPT,\n",
            "    similarity_top_k=5\n",
            ")"
         ]
      },
      {
         "cell_type": "code",
         "execution_count": 31,
         "id": "b8938077-6527-4008-8d0c-af7a8178ff10",
         "metadata": {
            "tags": []
         },
         "outputs": [
            {
               "data": {
                  "text/markdown": [
                     "<b>\n",
                     "\n",
                     "The existing answer is not relevant to the new context provided and therefore the original answer remains sufficient. The author dropped out of RISD in 1993 and moved to New York to pursue painting.</b>"
                  ],
                  "text/plain": [
                     "<IPython.core.display.Markdown object>"
                  ]
               },
               "metadata": {},
               "output_type": "display_data"
            }
         ],
         "source": [
            "display(Markdown(f\"<b>{response}</b>\"))"
         ]
      },
      {
         "cell_type": "markdown",
         "id": "2e024521-97b5-417f-8c27-950983f52cda",
         "metadata": {},
         "source": [
            "### [Beta] Use ChatGPTLLMPredictor\n",
            "\n",
            "Very simple GPT-Index-native ChatGPT wrapper. Note: this is a beta feature. If this doesn't work please\n",
            "use the suggested flow above."
         ]
      },
      {
         "cell_type": "code",
         "execution_count": 7,
         "id": "a49d9a1b-21fb-4153-ad24-191a13513d64",
         "metadata": {
            "tags": []
         },
         "outputs": [],
         "source": [
            "# use ChatGPT [beta]\n",
            "from gpt_index.llm_predictor.chatgpt import ChatGPTLLMPredictor\n",
            "from langchain.prompts.chat import SystemMessagePromptTemplate\n",
            "\n",
            "llm_predictor = ChatGPTLLMPredictor()"
         ]
      },
      {
         "cell_type": "code",
         "execution_count": null,
         "id": "596af2aa-7ddf-41f2-801b-4a24a4980dd8",
         "metadata": {
            "tags": []
         },
         "outputs": [],
         "source": [
            "response = index.query(\n",
            "    \"What did the author do during his time at RISD?\", \n",
            "    service_context=service_context\n",
            ")"
         ]
      },
      {
         "cell_type": "code",
         "execution_count": 9,
         "id": "771e20ba-ccba-447e-89d6-8d731accc6f3",
         "metadata": {
            "tags": []
         },
         "outputs": [
            {
               "data": {
                  "text/plain": [
                     "'Arrr, the scallywag went to RISD and had to do the foundation classes in fundamental subjects like drawing, color, and design.'"
                  ]
               },
               "execution_count": 9,
               "metadata": {},
               "output_type": "execute_result"
            }
         ],
         "source": [
            "str(response)"
         ]
      },
      {
         "cell_type": "code",
         "execution_count": null,
         "id": "829bb58d-910e-4a19-bbea-8a9546d24b92",
         "metadata": {},
         "outputs": [],
         "source": []
      }
   ],
   "metadata": {
      "kernelspec": {
         "display_name": "llama_index",
         "language": "python",
         "name": "llama_index"
      },
      "language_info": {
         "codemirror_mode": {
            "name": "ipython",
            "version": 3
         },
         "file_extension": ".py",
         "mimetype": "text/x-python",
         "name": "python",
         "nbconvert_exporter": "python",
         "pygments_lexer": "ipython3",
         "version": "3.10.10"
      }
   },
   "nbformat": 4,
   "nbformat_minor": 5
}